Mixed methods
\(ATE = \lambda^Y_{01} - \lambda^Y_{10}\)
We need to be able to compute the likelihood of the data given parameter values.
Key insight:
That, with priors, is enough to update:
\[p(\lambda | D) = \frac{p(D | \lambda)p(\lambda)}{p(D)}\]

## “By hand”
Let’s update manually.
Consider this joint distribution with binary \(X\) and binary \(Y\):
|  | Y = 0 | Y = 1 |
|---|---|---|
| X = 0 | \(b/2 + c/2\) | \(a/2 + d/2\) |
| X = 1 | \(a/2 + c/2\) | \(b/2 + d/2\) |
Reminder: \(a\) is the share with negative effects, \(b\) is the share with positive effects…
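As a quick consistency check, the four cell probabilities implied by any shares \(a, b, c, d\) summing to one must themselves sum to one. A base-R sketch (the function name `cell_probs` is ours, and \(X\) is assumed randomized with probability 1/2):

```r
# Cell probabilities implied by shares (a, b, c, d), assuming X is
# assigned 0/1 with probability 1/2 each (function name is illustrative)
cell_probs <- function(a, b, c, d) {
  c(p00 = (b + c) / 2, p01 = (a + d) / 2,
    p10 = (a + c) / 2, p11 = (b + d) / 2)
}

p <- cell_probs(0.1, 0.2, 0.3, 0.4)
sum(p)  # 1
```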
Say we now had (finite) data filling out this table. What posteriors should we form over \(a,b,c,d\)?
|  | Y = 0 | Y = 1 |
|---|---|---|
| X = 0 | \(n_{00}\) | \(n_{01}\) |
| X = 1 | \(n_{10}\) | \(n_{11}\) |
Let's start with a flat prior over the shares and then update over possible shares based on the data.
This time we will start with draws of possible shares and then look for posterior weights on each drawn share.
\[ \Pr(n_{00}, n_{01}, n_{10}, n_{11} \mid a,b,c,d) = f_{\text{multinomial}}\left( n_{00}, n_{01}, n_{10}, n_{11} \mid \sum n, w \right) \] where:
\[w = \left(\frac12(b + c), \frac12(a+d), \frac12(a+c), \frac12(b+d)\right)\]
Why multinomial? Because each of the \(\sum n\) units falls independently into one of the four \((X, Y)\) cells, with cell probabilities \(w\).
Prior draws (10,000 candidate share vectors):
x <- gtools::rdirichlet(10000, alpha = c(1,1,1,1)) |> as.data.frame()
names(x) <- letters[1:4]
x |> head() |> kable(digits = 3)

| a | b | c | d |
|---|---|---|---|
| 0.296 | 0.173 | 0.528 | 0.003 |
| 0.504 | 0.207 | 0.286 | 0.003 |
| 0.106 | 0.378 | 0.129 | 0.387 |
| 0.634 | 0.020 | 0.142 | 0.204 |
| 0.479 | 0.093 | 0.184 | 0.244 |
| 0.266 | 0.071 | 0.516 | 0.147 |
each row sums to 1; each point (row) lies on a simplex
Imagine we had data (number of units with given values of X and Y):
\(n_{00} = 400, n_{01} = 100, n_{10} = 100, n_{11} = 400\)
Difference in means = .6.
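To verify the difference in means from these counts (a base-R sketch; the variable names are ours):

```r
# Counts in the order n00, n01, n10, n11
n <- c(n00 = 400, n01 = 100, n10 = 100, n11 = 400)

# P(Y = 1 | X = 1) - P(Y = 1 | X = 0)
dim_est <- unname(n["n11"] / (n["n10"] + n["n11"]) -
                  n["n01"] / (n["n00"] + n["n01"]))
dim_est  # 0.6
```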
Then:
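The display code below assumes `likelihood` and `posterior` columns whose computation is not shown. A self-contained base-R sketch of that step, assuming a flat Dirichlet prior (re-drawn here via normalized gamma variates rather than `gtools::rdirichlet`), the counts above, and posterior weights proportional to the likelihood:

```r
set.seed(1)

# 10,000 draws from Dirichlet(1, 1, 1, 1) via normalized gamma variates
g <- matrix(rgamma(10000 * 4, shape = 1), ncol = 4)
x <- as.data.frame(g / rowSums(g))
names(x) <- letters[1:4]

# Multinomial likelihood of the observed counts under each drawn share vector
n <- c(400, 100, 100, 400)  # n00, n01, n10, n11
x$likelihood <- apply(x, 1, function(p)
  dmultinom(n, prob = c(p["b"] + p["c"], p["a"] + p["d"],
                        p["a"] + p["c"], p["b"] + p["d"]) / 2))

# With a flat prior over the draws, posterior weights are normalized likelihoods
x$posterior <- x$likelihood / sum(x$likelihood)
```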
x |>
mutate(likelihood = formatC(likelihood, format = "e", digits = 2),
posterior = formatC(posterior, format = "e", digits = 2)) |>
head() |>
kable(digits = 2)

| a | b | c | d | likelihood | posterior |
|---|---|---|---|---|---|
| 0.30 | 0.17 | 0.53 | 0.00 | 2.10e-212 | 1.72e-209 |
| 0.50 | 0.21 | 0.29 | 0.00 | 1.26e-221 | 1.03e-218 |
| 0.11 | 0.38 | 0.13 | 0.39 | 7.80e-46 | 6.38e-43 |
| 0.63 | 0.02 | 0.14 | 0.20 | 0.00e+00 | 0.00e+00 |
| 0.48 | 0.09 | 0.18 | 0.24 | 1.97e-231 | 1.61e-228 |
| 0.27 | 0.07 | 0.52 | 0.15 | 3.99e-194 | 3.26e-191 |
Spot the ridge: the likelihood depends on the shares only through the cell probabilities \(w\), so many different values of \(a, b, c, d\) fit the data equally well.
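To see the ridge concretely (a base-R sketch with illustrative share vectors): moving mass as \((a, b, c, d) \rightarrow (a + \epsilon, b + \epsilon, c - \epsilon, d - \epsilon)\) leaves all four cell probabilities, and hence the likelihood, unchanged:

```r
# Multinomial likelihood of the counts (400, 100, 100, 400) given shares p
lik <- function(p)
  dmultinom(c(400, 100, 100, 400),
            prob = c(p["b"] + p["c"], p["a"] + p["d"],
                     p["a"] + p["c"], p["b"] + p["d"]) / 2)

p1 <- c(a = 0.2, b = 0.3, c = 0.3, d = 0.2)
p2 <- c(a = 0.3, b = 0.4, c = 0.2, d = 0.1)  # p1 shifted by epsilon = 0.1

c(lik(p1), lik(p2))  # same cell probabilities, so (numerically) same likelihood
```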
Data on exogenous variables and a key outcome for many cases
E.g., data on inequality (\(I\)) and democracy (\(D\)) for many cases
CausalQueries uses information wherever it finds it

For Bayesian approaches this mixing is not hard.
Critically, though, we maintain the assumption that cases for “in depth” analysis are chosen at random; otherwise we have to account for selection processes.
What is the probability, given parameters \(\lambda\), of seeing these two cases: (1) \(X = 1, M = 1, Y = 1\); (2) \(X = 1, Y = 1\) with \(M\) unobserved?
The probability of 1 is:
\[p_{111}= \lambda^X_1 \times (\lambda^M_{01} + \lambda^M_{11}) \times (\lambda^Y_{01} +\lambda^Y_{11})\]
The probability of 2 is:
\[p_{1?1} = \lambda^X_1\times \left((\lambda^M_{01} + \lambda^M_{11}) \times (\lambda^Y_{01} +\lambda^Y_{11}) + (\lambda^M_{10} + \lambda^M_{00}) \times (\lambda^Y_{10} +\lambda^Y_{11}) \right)\]
So the probability of this data is just:
\[p(D|\lambda) = p_{111} \times p_{1?1}\]
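Plugging in illustrative values (a base-R sketch; the uniform \(\lambda\) values and variable names are assumptions for illustration, not estimates):

```r
# Hypothetical parameters for the chain model X -> M -> Y:
# lam_M["01"] is the share of units with M(X=0)=0, M(X=1)=1, etc.
lam_X1 <- 0.5
lam_M  <- c("00" = 0.25, "10" = 0.25, "01" = 0.25, "11" = 0.25)
lam_Y  <- c("00" = 0.25, "10" = 0.25, "01" = 0.25, "11" = 0.25)

# Case 1: X = 1, M = 1, Y = 1 all observed
p_111 <- lam_X1 * (lam_M["01"] + lam_M["11"]) * (lam_Y["01"] + lam_Y["11"])

# Case 2: X = 1, Y = 1, M unobserved (sum over M = 1 and M = 0 branches)
p_1q1 <- lam_X1 * ((lam_M["01"] + lam_M["11"]) * (lam_Y["01"] + lam_Y["11"]) +
                   (lam_M["10"] + lam_M["00"]) * (lam_Y["10"] + lam_Y["11"]))

p_D <- unname(p_111 * p_1q1)  # probability of observing both cases
```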
Insight:
If we imagine possible parameter values we can figure out the likelihood of any data type – quantitative, qualitative, or mixed.
That, with priors, is enough to update:
\[p(\lambda | D) = \frac{p(D | \lambda)p(\lambda)}{p(D)}= \frac{p(D | \lambda)p(\lambda)}{\int_{\lambda'}p(D|\lambda')p(\lambda')d\lambda'}\]
Remember:
Suppose we go to the field and we learn that mass mobilization DID occur in Malawi
What can we conclude?
NOTHING YET!
CausalQueries

CausalQueries brings these elements together by allowing users to:

- CausalQueries figures out all principal strata and places a prior on these
- CausalQueries writes a stan model and updates on all parameters
- CausalQueries figures out which parameters correspond to a given causal query

Consider this problem:
|  | Y = 0 | Y = 1 |
|---|---|---|
| X = 0 | \(n_{00}\) | \(n_{01}\) |
| X = 1 | \(n_{10}\) | \(n_{11}\) |
where \(X\) is randomized and both \(X\) and \(Y\) are binary
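The `posterior_distribution` output below was produced by code not shown. A sketch of how such a model is set up and updated with the CausalQueries package (requires CausalQueries and Stan to run; the toy data frame is a placeholder, not the counts above):

```r
library(CausalQueries)

# X -> Y model; CausalQueries enumerates the response types automatically
model <- make_model("X -> Y")

# In practice, data is a data frame with one row per case (0/1/NA entries)
data <- data.frame(X = c(0, 0, 1, 1), Y = c(0, 1, 0, 1))

# Updating writes and runs a stan model behind the scenes
model <- update_model(model, data)

# Posterior draws over the parameters
grab(model, "posterior_distribution")
```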
posterior_distribution
Summary statistics of model parameters posterior distributions:
Distributions matrix dimensions are
4000 rows (draws) by 6 cols (parameters)
mean sd
X.0 0.48 0.02
X.1 0.52 0.02
Y.00 0.28 0.07
Y.10 0.12 0.07
Y.01 0.50 0.07
Y.11 0.11 0.07
Posterior draws
The CausalQueries approach generalizes to settings in which nodes are categorical:
…stan to figure out \(\Pr(\lambda | \text{Data})\)

where dotted lines mean that the response types for two nodes are not independent
Example of an IV model. What are the principal strata (response types)? What relations of conditional independence are implied by the model?
| event | strategy | count |
|---|---|---|
| Z0X0Y0 | ZXY | 158 |
| Z1X0Y0 | ZXY | 52 |
| Z0X1Y0 | ZXY | 0 |
| Z1X1Y0 | ZXY | 23 |
| Z0X0Y1 | ZXY | 14 |
| Z1X0Y1 | ZXY | 12 |
| Z0X1Y1 | ZXY | 0 |
| Z1X1Y1 | ZXY | 78 |
Note that in compact form we simply record the number of units (“count”) that display each possible pattern of outcomes on the three variables (“event”).[^1]
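The compact form can be produced from unit-level data with a one-line aggregation (a base-R sketch; the toy data frame is hypothetical):

```r
# Hypothetical unit-level data with binary Z, X, Y
d <- data.frame(Z = c(0, 1, 1, 0, 1),
                X = c(0, 0, 1, 0, 1),
                Y = c(0, 0, 1, 1, 1))

# Collapse to one row per event pattern with its count
events <- as.data.frame(table(event = paste0("Z", d$Z, "X", d$X, "Y", d$Y)))
events
```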
Queries can be conditioned on observable or counterfactual quantities
| query | given | mean | sd | cred.low.2.5% | cred.high.97.5% |
|---|---|---|---|---|---|
| Y[X=1] - Y[X=0] | - | 0.55 | 0.10 | 0.37 | 0.73 |
| Y[X=1] - Y[X=0] | X==0 & Y==0 | 0.64 | 0.15 | 0.37 | 0.89 |
| Y[X=1] - Y[X=0] | X[Z=1] > X[Z=0] | 0.70 | 0.05 | 0.59 | 0.80 |